NMR Observation of Sulfhydryl Signals in SARS‐CoV‐2 Main Protease Aids Structural Studies

Abstract The 68‐kDa homodimeric 3C‐like protease of SARS‐CoV‐2, Mpro (3CLpro/Nsp5), is a key antiviral drug target. NMR spectroscopy of this large system proved challenging and resonance assignments have remained incomplete. Here we present the near‐complete (>97 %) backbone assignments of a C145A variant of Mpro (Mpro C145A) both with, and without, the N‐terminal auto‐cleavage substrate sequence, in its native homodimeric state. We also present SILLY (Selective Inversion of thioL and Ligand for NOESY), a simple yet effective pseudo‐3D NMR experiment that utilizes NOEs to identify interactions between Cys‐thiol or aliphatic protons, and their spatially proximate backbone amides in a perdeuterated protein background. High protection against hydrogen exchange is observed for 10 of the 11 thiol groups in Mpro C145A, even those that are partially accessible to solvent. A combination of SILLY methods and high‐resolution triple‐resonance NMR experiments reveals site‐specific interactions between Mpro, its substrate peptides, and other ligands, which present opportunities for competitive binding studies in future drug design efforts.

Cell pellets were resuspended in 25 mL of Buffer A (20 mM Tris pH 7.5, 150 mM NaCl, 20 mM Imidazole, 0.5 mM TCEP) supplemented with one tablet of cOmplete protease inhibitor cocktail (Roche). The cell suspension was lysed on ice by sonication (12 cycles of 30s on, 30s off) and the insoluble fraction was removed by centrifugation at 19,000 rpm for 1 hour at 4 °C (Beckman Coulter Avanti centrifuge JA-25.50). The soluble fraction was loaded (2 mL/min) onto two concatenated 5 ml HisTrap HP columns (GE Healthcare) pre-equilibrated in Buffer A using a 50 mL SuperLoop (GE Healthcare). After loading, the column was washed with 60 mL Buffer A; then the protein was eluted using a step gradient to Buffer B (20 mM Tris pH 7.5, 150 mM NaCl, 300 mM Imidazole, 0.5 mM TCEP). The peak elutes across 3 x 6 mL fractions which were then pooled and diluted to 3 mg/mL protein (using A280 = 1.2 × 10 3 for 1 mg/mL).
To this solution, pre-dissolved 1M dithiothreitol (DTT) was added to a final concentration of 20 mM, with gentle swirling to mix the two solutions. Next, degassed Buffer C (50 mM Tris pH 8, 1.5 M Guanidine HCl, 10% glycerol) was added in a 6:4 ratio, and incubated for 4-48h at room temperature without shaking in a 100 mL glass Duran sealed under N2 (g) (a method adapted from the approach used for PETNR [3] ). Extensive reprotonation was observed after 4 hours, with additional reprotonation observed for a subset of resonances after extending the incubation period to 48 hours. However, numerous buried residues in required an additional procedure to achieve full back exchange (see below). The reaction mix, typically ca. 60 mL was added to a 70 mL Slide-A-Lyzer dialysis cassette (10 kDa cutoff; Thermo Scientific), followed by dialysis against 4L (N2(g) degassed) Buffer D (PBS supplemented with 0.5 mM TCEP) for 2h at room temperature to remove the majority of the GuHCl. Subsequently, 0.5 mg of TEV protease and 10 mM DTT was added to the dialysis cassette and the reaction mixture was dialyzed against a fresh 4L of N2 degassed Buffer D overnight at 4 °C (TEV expressed in-house).
The reaction mixture was then loaded back onto the cleaned (Buffer B; supplemented with 8M urea) HisTrap column preequilibrated in Buffer A. The flow through was collected and concentrated to ca. 10 mL using a 10 kDa MWCO Amicon centrifugal concentrator, before being loaded onto a Hiload 26/60 Superdex G75 size-exclusion column previously washed with 1 column volume of 1 M NaOH and equilibrated in Buffer E (10 mM NaPi pH 7.0, 0.5 mM TCEP). Fractions containing M pro C145A were checked for purity using SDS-PAGE and pooled together, before being concentrated to ~ 1 mM by Amicon centrifugal concentrator (10 kDa MWCO) and flash frozen in liquid nitrogen for storage at − 80 °C. Protein concentrations were estimated by absorbance at 280 nm (ε280 = 32890 M −1 cm −1 ) which assumed all cysteines were reduced.
All chemical reagents were of analytical grade and were purchased from Sigma-Aldrich (USA), CortecNet (USA), or Fischer.

Full back exchange of amide protons
Despite the presence of roughly the expected number of backbone amide peaks in the TROSY-HSQC spectrum, initial assignments revealed that ca. 40 residues were missing in buried β-sheet regions, as well as in the C-terminal α-helical domain, whereas residues in the linker between domains II and III showed duplicate resonances. This indication of incomplete back exchange was later confirmed when an improved back-exchange protocol was developed for this protein. This method was adapted from the approach taken for another slowly back-exchanging globular protein, PETNR [3] , with minor modifications to adapt it for M pro . Initial work characterizing the 2002 variant of M pro [4] indicated that the 0.75M GuHCl would be sufficient to substantially unfold the catalytic N-terminal domain of the monomer, but would be insufficient to unfold the helical C-terminal domain (in our hands 0.9 -1.1 M was the highest GuHCl that could be used, but at the top of this range, a substantial increase of protein precipitation was observed). Using these observations, and the presence of the solubility-tag / IMAC-tag at the N-terminus, an unfolding -refolding step was introduced into the middle of the purification protocol. It should be noted that, unfolding of final purified protein product was observed to be effectively irreversible under standard conditions and temperatures, as were unfolding attempts using > 1.1M GuHCl and pH values over 8, as DTT and TCEP nolonger protect the 11 remaining Cys residues. Moreover, attempts to refold the fusion construct on the His-trap column were unsuccessful, likely limited by the permissible concentration of reducing agent and high effective protein concentration on the column.
The optimized protocol involved a step-elution from the IMAC column to retain a relatively high concentration (frequent elution at ca. 5 mg/mL) to avoid the need for a concentration step. A similar unfolding-refolding protocol was employed as for PETNR [3] , namely, the protein was diluted to 2 mg/mL at pH 8 and slowly added 1:1 to an equal volume of buffered 2.0 M GuHCl -to a total volume of ca. 70 mL. This mixture (both initial buffers containing 10-20 mM DTT) was incubated for 2d at room temperature in a glass Duran. Rather than snap-refolding the back-exchanged protein by dilution (as for PETNR [3] ; necessitating extensive concentration), we found that dialysis at room temperature for 2h in a 70mL dialysis cassette resulted in minimal protein precipitation. At this point, TEV protease was added to this dialysis cassette for the proteolytic cleavage of the N-terminal solubility/affinity tag overnight at 4 ºC (see methods for full details). Retention of the N-terminal solubility tag (unaffected by the increase in GuHCl concentration) for the unfolding step was important, preventing the partially unfolded peptide population from substantive aggregation (<1% observed). Following the cleavage reaction, the reaction mixture was passed back down an IMAC column which removed the TEV and (small population of) uncleaved M pro C145A fusion construct. The following concentration and gel filtration / buffer exchange steps are common practice (see methods). Nearly complete back exchange of backbone amides was indicated by the presence of many new peaks in the TROSY NMR spectrum ( Figure S1), and was further substantiated on collection of triple resonance backbone assignment spectra and NOE spectra. Several very weak, additional peaks observed in the reprotonated spectrum ( Figure S1) correspond to minor species that may result from the relatively harsh conditions during the reprotonation protocol. Their origin has not been further investigated. In both the main text and the Supporting Information, distinction is made between samples that either were (reprotonated) or were not (not-reprotonated) exposed to this partial refolding step, with samples being reprotonated unless otherwise stated. Because the reprotonation protocol was only developed towards the end of our study of M pro , spectra obtained during the early phase of our study were recorded on non-reprotonated protein. The absence of chemical shift perturbation between spectra of reprotonated and non-reprotonated samples indicates that the reprotonation protocol does not impact the structure or properties of the protein.

NMR experiments
Backbone assignment spectra were acquired on 1.1 mM 2 H, 15 N, 13 C-labelled M pro C145A samples in NMR buffer (10 mM NaPO4, 0.5 mM TCEP buffer (pH 7.0) supplemented with 2 H2O (3% v/v) and 0.3 mM sodium 3-(trimethylsilyl)propane-1-sulfonate (DSS)), but sample conditions are often repeated in figure captions for convenience. All assignment experiments were recorded at 298K unless otherwise stated. TROSY versions of the standard HNCO, HN(CA)CO, HNCA, and HNCB experiments [5] were recorded on a 700 MHz Bruker Avance III spectrometer equipped with a 5-mm TCI probe containing triple-axis gradients and running TopSpin software version 3.2. Additionally, 1 H NOESY-TROSY experiments were acquired on a 900 MHz (800 MHz) Bruker spectrometer with a Neo (Avance III) console, fitted with a 5-mm TCI probe equipped with single-axis (triple-axis) gradients and running TopSpin software version 4.1 (3.1).
Backbone 1 H N , 15 N, 13 C', 13 C α , and 13 C β chemical shifts were assigned for substrate-free M pro C145A using the standard triple resonance methodology (Gardner and Kay 1998). Spectra were processed in NMRPipe, [6] peak picking was performed in SPARKY [7] and POKY. [8] Frequency matching of the backbone assignments was achieved in an iterative manner, using FLYA [9] and manual checking in both Sparky and NMRView. SPARTA+ [10] was used to generate chemical shift predictions from PDB entry 6Y84 which were used by FLYA to guide the assignment strategy. SPARTA+ predicted chemical shifts were also used to cross-check the final assignments, along with comparison to the equivalent residues in the previously deposited BMRB entries (BMRB entries 17251 [11] and 17911).
The backbone 1 HN, 15 N, 13 C', 13 C α , 13  Programs for visualization and analysis were written using freely available python libraries, [12] as well as NMR-specific python libraries. [13]

AlphaFold-Multimer calculations
The installation of AlphaFold-2, [14] previously used by us, [15] was modified according to the protocol outlined by Evans et al. [16] and using an available online resource: https://github.com/jcheongs/alphafold-multimer (date retrieved: 2022/06/22). The full implementation of AlphaFold-Multimer (AF-M) was used (including Amber [17] forcefield relaxation, and without sequence restriction). AF-M calculations were performed for both (monomeric; 3h) M pro :SAVLQSGFRK and (dimeric; 15h) 2M pro :SAVLQSGFRK complexes, on a modest graphics card (NVIDIA RTX1080), and the predicted structures (SI Fig. S15) were close to PDB entry 2Q6G (SI Fig. S14), with non-H atom RMSD values for M pro (monomer; dimer) of (0.51Å; 0.84Å), and substrate (1.7Å; 3.9Å). The same process was repeated for VyLQ substrate analog. AlphaFold reports per-residue predicted local-distance difference test (pLDDT) scores that quantify confidence in local structure prediction, [14] which correlates with solution structure accuracy. [15,18] pLDDT scores are consistently high across the monomeric and dimeric M pro models, however only the dimeric M pro complex confidently predicts the N-and C-termini which occupy the inter-monomer interface. pLDDT scores for the decapeptide are high for pockets S4-S1' in the monomer, with a relatively tight clustering of binding poses across the AF-M predictions (SI Fig. S15). However, both clustering and pLDDT confidence scores deteriorate in the dimeric complex (SI Fig. S15). The final step of AF-M structure prediction is an energy minimization using Amber, which requires a fully protonated protein. Experimentally, proton positions are only accessible using exotic neutron-scattering experiments, rendering X-ray structures devoid of protons and requiring post-hoc software solutions for accurate proton placement (eg. reduce, [19] or DYNAMO [20] ). Consequently, the comparison between AF-M and X-ray should be restricted to non-H atoms.

Backbone Assignment Strategy
Several issues emerged due to repeated peak usage by the FLYA algorithm, so the keepassigned=true option was used in conjunction with a N15HSQCassigned.peaks reference shift file in order to transition to a semi-automated assignment protocol. Two states were observed with ca 9:1 intensity ratio in the TROSY-HSQC spectrum, centered around Pro184. Inspection of the 13 C β (i-1) peak of F185 revealed that the two states were due to cis-trans isomerization of P184, as evidenced by a substantial downfield 13 C β shift which is highly diagnostic. Increased flexibility of residues in the vicinity of P184 caused many of the minor cis conformer resonances to have stronger intensities than amides in other regions of the protein that were impacted by exchange broadening.
Given the relatively large range of intensities observed for M pro , this weaker Minor form therefore complicated the initial assignment, and precluded the effective use of many common assignment packages. Minor states were manually sought, identified, and excluded from the pool of spin systems, leading to a final coverage of 97.3% of all backbone resonances (97.3% 1 HN, 97.3% 15 N, 93.5% 13 C', 98.7% 13 Cα, 86.6% 13 Cβ nuclei), with 20 residues displaying a minor conformer. The missing residues are largely localized to the active site.

Chemical shift perturbation
In keeping with other approaches to combine chemical shift perturbations from multiple incomplete sets of heteronuclei, [21] we used the following equation to quantify the overall chemical shift perturbation (CSP): where for a given residue, N is the number of atoms available for comparison, αi is the RMS of secondary shifts in folded proteins in the SPARTA+ database, [10] which are 1.04, 1.16, 1.14, 2.56 and 0.54 ppm for 13 C α , 13 C β , 13 C', 15 N and 1 H N , respectively. Figure S1: Location of residues that require unfolding for full reprotonation of 2 H 15 N-M pro C145A in 10 mM NaPi pH 7.0 buffer supplemented with 0.5 mM TCEP and 0.3 mM DSS for chemical shift referencing. (A) Overlay of fully reprotonated (black) and non-reprotonated (red) 1 H 15 N-TROSY-HSQC spectra. Asterisks denote sidechain peaks that were aliased, while the backbone amide peak of A260 is aliased from 136.2 ppm and is fully reprotonated without unfolding. For improved visualization of overlayed resonances in both the reprotonated and non-reprotonated spectra, slightly more aggressive apodization in the 1 H dimension for the non-reprotonated spectrum was used. Sidechain amide resonances of Gln and Asn that are incompletely suppressed by the TROSY pulse scheme are shown in green. (B) Illustration of the M pro homodimer with slowly reprotonating residues (time constant > 3 months) colored orange, with separate monomer chains colored grey and blue for clarity, and the catalytic dyad of C145A and H41 colored magenta. (C) A single-chain representation of (B).  (red) and M pro H41Q (violet) with annotation of residues that either display moderate to large chemical shift perturbation between the two enzyme variants, or residues that are substantially sharper in the M pro H41Q variant due to modulation of conformational exchange processes in the active site such as R40, Q41, and I43. Figure S4: Cross section of the HNCB spectrum for residue F185, where the major and minor species are readily observable (exponential contour spacing separated by factors of 1.4). The difference in P184 C β chemical shift is 5.5 ppm, which is demonstrative of cis-trans isomerism in solution. These two species are clearly in slow-exchange on the NMR timescale which is common for proline isomerism in proteins. Figure S5: Comparison of chemical shift differences between the assignments of M pro C145A and M pro H41Q at 308K using the method of Williamson. [21] (A-E) Correlation plots of secondary-shift perturbations for backbone 1 H (A), 15 N (B), 13 C' (C), 13 C α (D), 13 C β (E) nuclei. (F) Chemical shift perturbations (CSPs; see Methods section) were plotted on the X-ray structure of M pro -PDB 5R8T -with missing residues colored in grey, and the catalytic dyad H41 and C145 shown in magenta. (G) A bar chart of CSP by residue with key residues annotated, and with active site regions shaded light blue. Secondary structure elements extracted from PDB 5R8T are annotated above the figure, with the three separate globular domains also indicated.  13 C' (C), 13 Cα (D), 13 Cβ (E) nuclei. (F) Chemical shift differences (CSPs; see Methods section) plotted on the X-ray structure of M pro -PDB 5R8T -with missing residues colored in grey, and the catalytic dyad H41 and C145 shown in magenta. Residues 1-199 were compared for assignments at 308K, residues 200-306 were compared to the M pro C145A assignment at 298K. Standard C α and C β deuterium isotope shifts were subtracted from M pro C145A resonances prior to comparison. [10] (G) A bar chart of CSP by residue with active site regions shaded light blue. Secondary structure elements extracted from PDB entry 5R8T are annotated above the figure, with the three separate globular domains also indicated. A dashed line at 0.8 indicates the threshold for coloring points in subplots A-E by magnitude of CSP (using the same color ramp as subplot F).

Figure S7:
Comparison of chemical shift differences between the assignments of M pro C145A (10 mM NaPi pH 7.0, 0.5 mM TCEP, 0.3 mM DSS, 3% D2O), and the partial M pro assignment BMRB 50780 (50mM NaPi pH 6.8, 40mM NaCl, 0.1 mM EDTA, 3mM THP (Tris(hydroxypropyl)phosphine), 5% D2O). [22] (A-E) Correlation plots of secondary shifts for backbone 1 H (A), 15 N (B), 13 C' (C), 13 C α (D), and 13 C β (E) nuclei. Residues with chemical shift perturbations (CSPs; see Methods section) greater than 0.5 (grey dashed line) are colored according to (F) -and plotted on the X-ray structure of M pro -PDB entry 5R8T with missing residues shown in grey, and the catalytic dyad H41 and C145 shown in magenta. Standard C α and C β deuterium isotope shifts were subtracted from M pro C145A resonances prior to comparison. [10] (G) A bar chart of CSP by residue with key residues annotated, and with active site regions shaded light blue. Secondary structure elements extracted from PDB entry 5R8T are annotated above the figure, with the three separate globular domains also indicated.          Table S6 for details). The spectrum was reconstructed using SMILE [24] with 50% extension of indirect dimensions. NOESY peaks were automatically picked and curated using Sparky and used to both aid the automated assignment of the M pro C145A:SAVLQSGFRK complex, and validate assignments manually using NMRDraw and the scroll.tcl macro. Missing assignments are annotated.  . NUS data (32% sampled) were acquired using 38 ms indirect 13 C α chemical shift evolution (States-TPPI) and a 50 ms 15 N chemical shift evolution (Echo-AntiEcho), with 2 transients per increment, and a TROSY readout. 13 C α non-uniform sampling comprised fully (randomly) sampling data points up to 7ms (38ms), with 2 H decoupling achieved using a WALTZ16 scheme. [25] The spectrum was reconstructed using SMILE [24] with 50% extension of the indirect dimensions, employing virtual decoupling of the 1 JCαCβ couplings, using a similar approach to the method published by Kazimierczuk et. al [26] . HNCA peaks were automatically picked and curated using Sparky and used to both aid the automated assignment of the apo-M pro C145A complex and validate assignments manually in NMRDraw using the scroll.tcl macro. Missing assignments are annotated.